Abstract

We discuss the content and significance of John von Neumann's quantum ergodic theorem (QET) of 1929, a strong result arising from the mere mathematical structure of quantum mechanics. The QET is a precise formulation of what we call normal typicality, i.e., the statement that, for typical large systems, every initial wave function from an energy shell is "normal": it evolves in such a way that $|\psi_t\rangle\langle\psi_t|$ is, for most $t$, macroscopically equivalent to the micro-canonical density matrix. The QET has been mostly forgotten after it was criticized as a dynamically vacuous statement in several papers in the 1950s. However, we point out that this criticism does not apply to the actual QET, a correct statement of which does not appear in these papers, but to a different (indeed weaker) statement. Furthermore, we formulate a stronger statement of normal typicality, based on the observation that the bound on the deviations from the average specified by von Neumann is unnecessarily coarse and a much tighter (and more relevant) bound actually follows from his proof.
1 Introduction



Quantum statistical mechanics has many similarities to the classical version, and also some differences. Two facts true in the quantum but not in the classical case, canonical typicality and (what we call) normal typicality, follow from just the general mathematical structure of quantum mechanics. Curiously, both were discovered early on in the history of quantum mechanics, in fact both in the 1920s, and subsequently forgotten until recently. Canonical typicality was basically anticipated, though not clearly articulated, by Schrödinger in 1927 [27], and rediscovered a few years ago by several groups independently [9, ?, 23]. Normal typicality, the topic of this paper, was discovered, clearly articulated, and rigorously proven by John von Neumann in 1929 [31] as a "quantum ergodic theorem" (QET). In the 1950s, though, the QET was heavily criticized in two influential papers [7, 1] as irrelevant to quantum statistical mechanics, and indeed as dynamically vacuous. The criticisms (repeated in [2, ?, ?, ?, 15]) have led many to dismiss von Neumann's QET (e.g., [?], [11, p. 273], [21], [?], [21], [23, p. 227]). We show here that these criticisms are invalid. They actually apply to a statement different from (indeed weaker than) the original theorem. The dismissal of the QET is therefore unjustified. Furthermore, we also formulate two new statements about normal typicality, see Theorems 2 and 3 below, which in fact follow from von Neumann's proof. (We provide further discussion of von Neumann's QET article in a subsequent work [12].)

In recent years, there has been a renewed strong interest in the foundations of quantum statistical mechanics, see [?, 25, 11, ?]; von Neumann's work, which has been mostly forgotten, has much to contribute to this topic.

The QET concerns the long-time behavior of the quantum state vector

$$\psi_t = \exp(-iHt)\,\psi_0 \qquad (1)$$

(where we have set $\hbar = 1$) of a macroscopic quantum system, e.g., one with more than $10^{20}$ particles, enclosed in a finite volume. Suppose that $\psi_t$ belongs to a "micro-canonical" subspace $\mathscr{H}$ of the Hilbert space $\mathscr{H}_{\mathrm{total}}$, corresponding to an energy interval that is large on the microscopic scale, i.e., contains many eigenvalues, but small on the macroscopic scale, i.e., different energies in that interval are not discriminated macroscopically. Thus, the dimension of $\mathscr{H}$ is finite but huge, in fact exponential in the number of particles. We use the notation

$$D = \dim \mathscr{H} \qquad (2)$$

(= $S_a$ in [31]). The micro-canonical density matrix $\rho_{mc}$ is then $1/D$ times the identity operator on $\mathscr{H}$, and the micro-canonical average of an observable $A$ on $\mathscr{H}$ is given by

$$\mathrm{tr}(\rho_{mc}A) = \frac{\mathrm{tr}A}{D} = \mathbb{E}\langle\varphi|A|\varphi\rangle, \qquad (3)$$

where $\varphi$ is a random vector with uniform distribution over the unit sphere of $\mathscr{H}$,

$$\{\varphi \in \mathscr{H} : \|\varphi\| = 1\}, \qquad (4)$$

and $\mathbb{E}$ means expectation value. In the following, we denote the time average of a function $f(t)$ by a bar,

$$\overline{f(t)} = \lim_{T\to\infty}\frac{1}{T}\int_0^T dt\, f(t). \qquad (5)$$
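Identity (3) can be checked numerically. The sketch below is only an illustration, not part of the original argument. It assumes NumPy, takes a hypothetical random Hermitian matrix `A` as the observable on a shell of hypothetical dimension `D = 50`, and samples uniformly distributed unit vectors by normalizing complex Gaussian vectors (a standard way to obtain the uniform distribution on the unit sphere).

```python
import numpy as np

rng = np.random.default_rng(0)
D = 50                                    # hypothetical dimension of the energy shell
M = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
A = (M + M.conj().T) / 2                  # hypothetical Hermitian observable

# Uniformly distributed unit vectors: normalized complex Gaussian vectors
n = 20000
Z = rng.normal(size=(n, D)) + 1j * rng.normal(size=(n, D))
phi = Z / np.linalg.norm(Z, axis=1, keepdims=True)

mc_average = np.trace(A).real / D         # tr(rho_mc A) = tr(A)/D
expectations = np.einsum("ni,ij,nj->n", phi.conj(), A, phi).real   # <phi|A|phi>
print(mc_average, expectations.mean())    # agree up to Monte Carlo error
```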

Despite the name, the property described in the QET is not precisely analogous to the standard notion of ergodicity as known from classical mechanics and the mathematical theory of dynamical systems. That is why we prefer to call quantum systems with the relevant property "normal" rather than "ergodic." Nevertheless, to formulate a quantum analog of ergodicity was von Neumann's motivation for the QET. It is characteristic of ergodicity that time averages coincide with phase-space averages. Put differently, letting $X_t$ denote the phase point at time $t$ of a classical Hamiltonian system, $\delta_{X_t}$ the delta measure concentrated at that point, and $\mu_{mc}$ the micro-canonical (uniform) measure on an energy surface, ergodicity is equivalent to

$$\overline{\delta_{X_t}} = \mu_{mc} \qquad (6)$$

for almost every $X_0$ on this energy surface. In quantum mechanics, if we regard a pure state $|\psi_t\rangle\langle\psi_t|$ as analogous to the pure state $\delta_{X_t}$ and $\rho_{mc}$ as analogous to $\mu_{mc}$, the statement analogous to (6) reads

$$\overline{|\psi_t\rangle\langle\psi_t|} = \rho_{mc}. \qquad (7)$$

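As pointed out by von Neumann [31], the left hand side always exists and can be computed as follows. Let $\{\phi_\alpha\}$ be an orthonormal basis of eigenvectors of $H$ with eigenvalues $E_\alpha$. If $\psi_0$ has coefficients $c_\alpha = \langle\phi_\alpha|\psi_0\rangle$,

$$\psi_0 = \sum_{\alpha=1}^D c_\alpha|\phi_\alpha\rangle, \qquad (8)$$

then

$$\psi_t = \sum_{\alpha=1}^D e^{-iE_\alpha t}c_\alpha|\phi_\alpha\rangle, \qquad (9)$$

and thus

$$|\psi_t\rangle\langle\psi_t| = \sum_{\alpha,\beta} e^{-i(E_\alpha - E_\beta)t}\,c_\alpha c_\beta^*\,|\phi_\alpha\rangle\langle\phi_\beta|. \qquad (10)$$

Suppose that $H$ is non-degenerate; then $E_\alpha - E_\beta$ vanishes only for $\alpha = \beta$, so the time averaged exponential is $\delta_{\alpha\beta}$, and we have that

$$\overline{|\psi_t\rangle\langle\psi_t|} = \sum_\alpha |c_\alpha|^2\,|\phi_\alpha\rangle\langle\phi_\alpha|. \qquad (11)$$

While the case (7) occurs only for those special wave functions that have $|c_\alpha|^2 = 1/D$ for all $\alpha$, in many cases it is true of all initial wave functions $\psi_0$ on the unit sphere of $\mathscr{H}$ that $\overline{|\psi_t\rangle\langle\psi_t|}$ is macroscopically equivalent to $\rho_{mc}$.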
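Formula (11) is the $T \to \infty$ limit of a finite-time average. The sketch below assumes NumPy and a small hypothetical non-degenerate spectrum. It works in the eigenbasis of $H$ and computes the exact average of (10) over $[0, T]$, using $\frac{1}{T}\int_0^T e^{-i\omega t}\,dt = (1 - e^{-i\omega T})/(i\omega T)$ for $\omega \neq 0$. It shows that the off-diagonal terms die out like $1/T$, while the diagonal already equals $|c_\alpha|^2$.

```python
import numpy as np

def finite_time_average(energies, c, T):
    """(1/T) * integral_0^T |psi_t><psi_t| dt in the eigenbasis of H, cf. (10).

    energies: eigenvalues E_alpha; c: coefficients c_alpha = <phi_alpha|psi_0>.
    """
    omega = energies[:, None] - energies[None, :]        # E_alpha - E_beta
    with np.errstate(divide="ignore", invalid="ignore"):
        # (1/T) * integral_0^T exp(-i omega t) dt, which equals 1 where omega == 0
        phase_avg = np.where(omega == 0, 1.0 + 0j,
                             (1 - np.exp(-1j * omega * T)) / (1j * omega * T))
    return np.outer(c, c.conj()) * phase_avg

rng = np.random.default_rng(0)
energies = np.array([0.0, 1.0, 2.3, 3.7, 5.2, 7.1])     # hypothetical non-degenerate spectrum
c = rng.normal(size=6) + 1j * rng.normal(size=6)
c /= np.linalg.norm(c)                                   # normalized psi_0
dephased = np.diag(np.abs(c) ** 2)                       # right-hand side of (11)

for T in (10.0, 1e3, 1e5):
    deviation = np.abs(finite_time_average(energies, c, T) - dephased).max()
    print(f"T = {T:g}: largest deviation from (11) = {deviation:.1e}")
```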
What we mean here by macroscopic equivalence corresponds in the work of von Neumann [21] to a decomposition of $\mathscr{H}$ into mutually orthogonal subspaces $\mathscr{H}_\nu$,

$$\mathscr{H} = \bigoplus_\nu \mathscr{H}_\nu, \qquad (12)$$

such that each $\mathscr{H}_\nu$ corresponds to a different macro-state $\nu$. We call the $\mathscr{H}_\nu$ the "macro-spaces" and write $\mathscr{D}$ for the family $\{\mathscr{H}_\nu\}$ of subspaces, called a "macro-observer" in von Neumann's paper, and $P_\nu$ for the projection to $\mathscr{H}_\nu$. We use the notation

$$d_\nu = \dim \mathscr{H}_\nu \qquad (13)$$

(= $s_{\nu,a}$ in [31]).¹

As a simple example, we may consider, for a gas consisting of $n > 10^{20}$ atoms enclosed in a box $\Lambda \subset \mathbb{R}^3$, the following 51 macro-spaces $\mathscr{H}_0, \mathscr{H}_2, \ldots, \mathscr{H}_{100}$: $\mathscr{H}_\nu$ contains the quantum states for which the number of atoms in the left half of $\Lambda$ lies between $\nu - 1$ percent of $n$ and $\nu + 1$ percent of $n$. Note that in this example $\mathscr{H}_{50}$ has the overwhelming majority of dimensions.²

1 Von Neumann motivated the decomposition (12) by beginning with a family of operators corresponding to coarse-grained macroscopic observables and arguing that by "rounding" the operators, the family can be converted to a family of operators $M_1, \ldots, M_k$ that commute with each other, have pure point spectrum, and have huge degrees of degeneracy. (This reasoning has inspired research about whether for given operators $A_1, \ldots, A_k$ whose commutators are small one can find approximations $M_i \approx A_i$ that commute exactly; the answer is, for $k \geq 3$ and general $A_1, \ldots, A_k$, no [3].) A macro-state can then be characterized by a list $\nu = (m_1, \ldots, m_k)$ of eigenvalues $m_i$ of the $M_i$, and corresponds to the subspace $\mathscr{H}_\nu \subseteq \mathscr{H}$ containing the simultaneous eigenvectors of the $M_i$ with eigenvalues $m_i$; that is, $\mathscr{H}_\nu$ is the intersection of the respective eigenspaces of the $M_i$, and $d_\nu$ is the degree of simultaneous degeneracy of the eigenvalues $m_1, \ldots, m_k$. For a notion of macro-spaces that does not require that the corresponding macro-observables commute, see [?], in particular Section 2.1.1. (Concerning the main results discussed below, Theorems 1 and 2, a plausible guess is that normal typicality extends to non-commuting families $A_1, \ldots, A_k$ of observables (which may also fail to commute with $\rho_{mc}$), provided that the observables have a sufficiently small variance in the sense of Lemma 1 below, i.e., that $\mathrm{Var}(\langle\varphi|A|\varphi\rangle)$ be small. We shall however not elaborate on this here.)

2 Actually, these subspaces form an orthogonal decomposition of $\mathscr{H}_{\mathrm{total}}$ rather than of the energy shell $\mathscr{H}$, since the operator of particle number in the left half of $\Lambda$ fails to map $\mathscr{H}$ to itself. Thus, certain approximations that we do not want to describe here are necessary in order to obtain an orthogonal decomposition of $\mathscr{H}$.

Given $\mathscr{D}$, we say that two density matrices $\rho$ and $\rho'$ are macroscopically equivalent, in symbols

$$\rho \overset{\mathscr{D}}{\sim} \rho', \qquad (14)$$

if and only if

$$\mathrm{tr}(\rho P_\nu) \approx \mathrm{tr}(\rho' P_\nu) \qquad (15)$$

for all $\nu$. (The sense of $\approx$ will be made precise later.) For example, $|\psi\rangle\langle\psi| \sim \rho_{mc}$ if and only if

$$\|P_\nu\psi\|^2 \approx \frac{d_\nu}{D} \qquad (16)$$

for all $\nu$. This is, in fact, the case for most vectors $\psi$ on the unit sphere of $\mathscr{H}$, provided the $d_\nu$ are sufficiently large, as follows (see (36)) from the following easy geometrical fact, see, e.g., [?, p. 55]; see also Appendix II of [?].



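Lemma 1. If $\mathscr{H}_\nu$ is any fixed subspace of dimension $d_\nu$ and $\varphi$ is a random vector with uniform distribution on the unit sphere, then

$$\mathbb{E}\|P_\nu\varphi\|^2 = \frac{d_\nu}{D}, \qquad \mathrm{Var}\|P_\nu\varphi\|^2 = \mathbb{E}\Big(\|P_\nu\varphi\|^2 - \frac{d_\nu}{D}\Big)^2 = \frac{(D - d_\nu)\,d_\nu}{D^2(D+1)} \leq \frac{1}{D}\,\frac{d_\nu}{D}. \qquad (17)$$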
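Lemma 1 is easy to test by simulation. The sketch below assumes NumPy and hypothetical sizes `D = 200`, `d = 30`. Because the uniform distribution is unitarily invariant, it takes $\mathscr{H}_\nu$ to be spanned by the first `d` coordinate vectors. It then compares the sample mean and variance of $\|P_\nu\varphi\|^2$ with the values in (17).

```python
import numpy as np

def random_unit_vectors(D, samples, rng):
    """Uniformly distributed unit vectors in C^D (normalized complex Gaussians)."""
    z = rng.normal(size=(samples, D)) + 1j * rng.normal(size=(samples, D))
    return z / np.linalg.norm(z, axis=1, keepdims=True)

D, d = 200, 30                        # hypothetical D and d_nu
rng = np.random.default_rng(1)
phi = random_unit_vectors(D, 10000, rng)

# By unitary invariance we may take H_nu = span of the first d basis vectors,
# so that ||P_nu phi||^2 is the squared norm of the first d coordinates.
X = np.sum(np.abs(phi[:, :d]) ** 2, axis=1)

exact_var = (D - d) * d / (D**2 * (D + 1))
print("mean:", X.mean(), "vs d/D =", d / D)
print("var: ", X.var(), "vs", exact_var, "<= d/D^2 =", d / D**2)
```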



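Returning to the time average, we obtain from (11) that $\overline{|\psi_t\rangle\langle\psi_t|} \sim \rho_{mc}$ if and only if

$$\sum_\alpha |c_\alpha|^2 \langle\phi_\alpha|P_\nu|\phi_\alpha\rangle \approx \frac{d_\nu}{D} \qquad (18)$$

for all $\nu$ (the left hand side is $\mathrm{tr}(\overline{|\psi_t\rangle\langle\psi_t|}\,P_\nu)$). Condition (18) is satisfied for every $\psi_0 \in \mathscr{H}$ with $\|\psi_0\| = 1$ if

$$\langle\phi_\alpha|P_\nu|\phi_\alpha\rangle \approx \frac{d_\nu}{D} \qquad (19)$$

for every $\alpha$ and $\nu$, a condition on $H$ and $\mathscr{D}$ that von Neumann showed is typically obeyed, in a sense which we shall explain. The analogy between $\overline{|\psi_t\rangle\langle\psi_t|} \sim \rho_{mc}$ and ergodicity lies in the fact that the time average of a pure state in a sense agrees with the micro-canonical ensemble, with two differences: the agreement is only an approximate agreement on the macroscopic level, and it typically holds for every, rather than almost every, pure state.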

However, even more is true for many quantum systems: Not just the time average but even $|\psi_t\rangle\langle\psi_t|$ itself is macroscopically equivalent to $\rho_{mc}$ for most times $t$ in the long run, i.e.,

$$\|P_\nu\psi_t\|^2 \approx \frac{d_\nu}{D} \qquad (20)$$

for all $\nu$ for most $t$. Such a system, defined by $H$, $\mathscr{D}$, and $\psi_0$, we call normal, a terminology inspired by the concept of a normal real number [?]. Above we have stressed the continuity with the standard notion of ergodicity. Yet, normality is in part stronger than ergodicity (it involves no time-averaging) and in part weaker (it involves only macroscopic equivalence); in short, it is a different notion.

Suppose now, as in the example between (13) and (14), that one of the macro-spaces, $\mathscr{H}_\nu = \mathscr{H}_{eq}$, has the overwhelming majority of dimensions,

$$\frac{d_{eq}}{D} \approx 1. \qquad (21)$$

It is then appropriate to call this macro-state the thermal equilibrium state and write $\nu = eq$. We say that the system is in thermal equilibrium at time $t$ if and only if $\|P_{eq}\psi_t\|^2$ is close to 1, or, put differently, if and only if

$$\|P_{eq}\psi_t\|^2 \approx \frac{d_{eq}}{D}. \qquad (22)$$
Thus, if a system is normal then it is in equilibrium most of the time. Of course, if it is not in equilibrium initially, the waiting time until it first reaches equilibrium is not specified, and may be longer than the present age of the universe.³

The case that one of the $\mathscr{H}_\nu$ has the overwhelming majority of dimensions is an important special case but was actually not considered by von Neumann; it is discussed in detail in [10]. Von Neumann (and many other authors) had a different understanding of thermal equilibrium; he would have said a system is in thermal equilibrium at time $t$ if and only if (20) holds for all $\nu$, so that $|\psi_t\rangle\langle\psi_t| \sim \rho_{mc}$. Here we disagree with him, as well as with his suggestion that the further theorem in [31], which he called the "quantum H-theorem" and which is a close cousin of the QET, is a quantum analog of Boltzmann's H-theorem. Yet other definitions of thermal equilibrium have been used in [22, ?]; see Section 6 of [10] for a comparative overview, and [12] for a broader overview of such definitions.
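For the gas example, the claim that $\mathscr{H}_{50}$ carries the overwhelming majority of dimensions can be made plausible with a crude toy count. This count is our own simplifying assumption, not the paper's: it ignores the energy constraint and the approximations mentioned in footnote 2, and counts each of the $2^n$ assignments of atoms to the left or right half as one dimension. The sketch (plain Python, `math.lgamma` for log-binomials) computes the fraction of assignments with between 49 and 51 percent of the atoms on the left. That fraction tends to 1 quickly as $n$ grows, even for $n$ far below $10^{20}$.

```python
import math

def fraction_near_half(n, width=0.01):
    """Toy count: fraction of the 2**n left/right assignments of n atoms with
    between (1/2 - width) n and (1/2 + width) n atoms in the left half."""
    lo, hi = math.ceil((0.5 - width) * n), math.floor((0.5 + width) * n)
    log_total = n * math.log(2)
    # sum of binomial(n, k) / 2**n, evaluated in log space to avoid overflow
    return sum(math.exp(math.lgamma(n + 1) - math.lgamma(k + 1)
                        - math.lgamma(n - k + 1) - log_total)
               for k in range(lo, hi + 1))

for n in (10**2, 10**4, 10**6):
    print(n, fraction_near_half(n))
```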

The QET provides conditions under which a system is normal for every initial state vector $\psi_0$. Note that statements about most initial state vectors $\psi_0$ are much weaker; for example, most state vectors $\psi_0$ are in thermal equilibrium by Lemma 1, so a statement about most $\psi_0$ need not convey any information about systems starting out in non-equilibrium. Furthermore, the QET asserts normal typicality, i.e., that typical macroscopic systems are normal for every $\psi_0$; more precisely, that for most choices of $\mathscr{D}$ (or $H$), macroscopic systems are normal for every $\psi_0$. It thus provides reason to believe that macroscopic systems in practice are normal.

Informal statement of the QET (for fully precise statements see Theorems 1, 2, and 3 below): Following von Neumann, we say that a Hamiltonian $H$ with non-degenerate eigenvalues $E_1, \ldots, E_D$ has no resonances if and only if

$$E_\alpha - E_\beta \neq E_{\alpha'} - E_{\beta'} \quad \text{unless either } \alpha = \alpha' \text{ and } \beta = \beta', \text{ or } \alpha = \beta \text{ and } \alpha' = \beta'. \qquad (23)$$

In words, this means that also the energy differences are non-degenerate. Let $\mathscr{H}$ be any Hilbert space of finite dimension $D$, and let $H$ be a self-adjoint operator on $\mathscr{H}$ with no degeneracies and no resonances. If the natural numbers $d_\nu$ are sufficiently large (precise conditions will be given later) and $\sum_\nu d_\nu = D$, then most families $\mathscr{D} = \{\mathscr{H}_\nu\}$ of mutually orthogonal subspaces with $\dim\mathscr{H}_\nu = d_\nu$ are such that for every wave function $\psi_0 \in \mathscr{H}$ with $\|\psi_0\| = 1$ and every $\nu$, (20) holds most of the time in the long run.
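Condition (23) can be checked mechanically for a given finite spectrum. The brute-force sketch below is only illustrative (plain Python plus NumPy, $O(D^4)$ work, so small $D$ only). The tolerance `tol`, which decides when two floating-point differences count as equal, is a hypothetical choice. A degenerate spectrum also fails this test, consistent with (23) presupposing non-degenerate eigenvalues.

```python
from itertools import product

import numpy as np

def has_no_resonances(E, tol=1e-9):
    """Check condition (23): E[a] - E[b] == E[a2] - E[b2] only in the trivial cases
    (a == a2 and b == b2) or (a == b and a2 == b2). Brute force over all quadruples."""
    D = len(E)
    for a, b, a2, b2 in product(range(D), repeat=4):
        trivial = (a == a2 and b == b2) or (a == b and a2 == b2)
        if not trivial and abs((E[a] - E[b]) - (E[a2] - E[b2])) < tol:
            return False
    return True

print(has_no_resonances(np.array([0.0, 1.0, 2.0])))           # equal gaps: False
print(has_no_resonances(np.array([0.0, 1.0, np.sqrt(2.0)])))  # True
```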

When we say that a statement $p(x)$ is true "for most $x$" we mean that

$$\mu\{x \mid p(x)\} \geq 1 - \delta, \qquad (24)$$

where $0 < \delta \ll 1$, and $\mu$ is a suitable probability measure; we will always use the appropriate uniform measure, as specified explicitly in Section 2. (When we speak of "most of the time in the long run", the meaning is a bit more involved since there is no uniform probability measure on the half axis $[0, \infty)$; see Section 2.)

3 Furthermore, due to the quasi-periodicity of the time-dependence of any density matrix (not just a pure one) of our system, it will keep on returning to (near) its initial state.

Let $p(\mathscr{D}, \psi_0)$ be the statement that for every $\nu$, (20) holds most of the time in the long run. The misunderstanding of the QET starting in the 1950s consists of mixing up the statement

$$\text{for most } \mathscr{D}: \ \text{for all } \psi_0: \ p(\mathscr{D}, \psi_0), \qquad (25)$$

which is part of the QET, with the inequivalent statement

$$\text{for all } \psi_0: \ \text{for most } \mathscr{D}: \ p(\mathscr{D}, \psi_0). \qquad (26)$$

To see that these two statements are indeed inequivalent, let us illustrate the difference between "for most $x$: for all $y$: $p(x,y)$" and "for all $y$: for most $x$: $p(x,y)$" by two statements about a company:

Most employees are never ill. (27)

On each day, most employees are not ill. (28)

Here, $x$ ranges over employees, $y$ over days, and $p(x, y)$ is the statement "Employee $x$ is not ill on day $y$." It is easy to understand that (27) implies (28), and that (28) does not imply (27), as there is the (very plausible) possibility that most employees are sometimes ill, but not on the same day.
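The difference between (27) and (28) is only the order of the quantifiers, which is easy to see on a small table of truth values. The sketch below assumes NumPy. The company data and the threshold `delta` that defines "most" are hypothetical. Every employee is ill on exactly one day, but the sick days are spread out, so (28) holds while (27) fails.

```python
import numpy as np

def most(fraction_true, delta=0.1):
    """'For most', with a hypothetical threshold: true on at least a 1 - delta fraction."""
    return fraction_true >= 1 - delta

employees, days = 100, 365
healthy = np.ones((employees, days), dtype=bool)   # healthy[x, y] is p(x, y)
for x in range(employees):
    healthy[x, (3 * x) % days] = False              # each employee ill on one distinct day

statement_27 = most(healthy.all(axis=1).mean())                       # for most x: for all y
statement_28 = all(most(healthy[:, y].mean()) for y in range(days))  # for all y: for most x
print(statement_27, statement_28)                   # False True
```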

Von Neumann's proof establishes (25), while the proofs in [7, 1] establish only the weaker version (26). Von Neumann also made clear in a footnote on p. 58 of his article [31] which version he intended:

Note that what we have shown is not that for every given $\psi$ or $A$ the ergodic theorem and the H-theorem hold for most $\omega_{\lambda,\nu,a}$, but instead that they hold universally for most $\omega_{\lambda,\nu,a}$, i.e., for all $\psi$ and $A$. The latter is of course much more than the former.

Here, $A$ is not important right now, while $\omega_{\lambda,\nu,a}$ corresponds to $\mathscr{D}$ in our notation. So the quotation means that what von Neumann has shown is not (26) but (25) for a certain $p$.

The remainder of this paper is organized as follows. In Section 2 we make explicit which measures are used in the role of $\mu$. In Section 4 we give the precise definition of normality. Section 5 contains a precise formulation of von Neumann's theorem and an outline of his proof. Section 6 contains our stronger version of the QET with tighter bounds on the deviations. In Section 7 we show that the versions of the QET in [7, 1] differ from the original as described above. In Section 8 we provide another version of the QET, assuming typical $H$ instead of typical $\mathscr{D}$. Finally, in Section 9 we compare von Neumann's result with recent literature.
2 Measures of "Most"



Let us specify which measure $\mu$ is intended in (24) when referring to most wave functions, most unitary matrices, most orthonormal bases, most Hamiltonians, most subspaces, or most decompositions $\mathscr{D}$. It is always the appropriate uniform probability measure.

For wave functions $\psi$, $\mu$ is the (normalized, $(2D-1)$-dimensional) surface area measure on the unit sphere in Hilbert space $\mathscr{H}$.

For unitary matrices $U = (U_{\alpha\beta})$, the uniform probability distribution over the unitary group $U(D)$ is known as the Haar measure. It is the unique normalized measure that is invariant under multiplication (either from the left or from the right) by any fixed unitary matrix.

For orthonormal bases, the Haar measure defines a probability distribution (the uniform distribution) over the set of orthonormal bases of $\mathscr{H}$, $\mathrm{ONB}(\mathscr{H})$, as follows. Fix first some orthonormal basis $\phi_1, \ldots, \phi_D$ for reference. Any other orthonormal basis $\omega_1, \ldots, \omega_D$ can be expanded into the $\phi_\beta$,

$$\omega_\alpha = \sum_{\beta=1}^D U_{\alpha\beta}\,\phi_\beta, \qquad (29)$$

where the coefficients $U_{\alpha\beta}$ form a unitary matrix. Conversely, for any given unitary matrix $U = (U_{\alpha\beta})$, (29) defines an orthonormal basis; thus, a random Haar-distributed $U$ defines a random orthonormal basis $(\omega_\alpha)$, whose distribution we call the uniform distribution. It is independent of the choice of the reference basis $\phi$ because the Haar measure is invariant under right multiplication by a fixed unitary matrix. Note also that the marginal distribution of any single basis vector $\omega_\alpha$ is the uniform distribution on the unit sphere in $\mathscr{H}$.

For Hamiltonians, we will regard the eigenvalues as fixed and consider the uniform measure for the eigenbasis. This is the same distribution as that of $H = UH_0U^{-1}$ when $U$ has uniform distribution and $H_0$ is fixed.

For subspaces, we will regard the dimension $d$ as fixed; the measure over all subspaces of dimension $d$ arises from the measure on $\mathrm{ONB}(\mathscr{H})$ as follows. If the random orthonormal basis $\omega_1, \ldots, \omega_D$ has uniform distribution, we consider the random subspace spanned by $\omega_1, \ldots, \omega_d$ and call its distribution uniform.

For decompositions $\mathscr{D} = \{\mathscr{H}_\nu\}$, we will regard the number $N$ of subspaces as fixed, as well as their dimensions $d_\nu$; the measure over decompositions arises from the measure on $\mathrm{ONB}(\mathscr{H})$ as follows. Given the orthonormal basis $\omega_1, \ldots, \omega_D$, we let $\mathscr{H}_\nu$ be the subspace spanned by those $\omega_\alpha$ with $\alpha \in J_\nu$, where the index sets $J_\nu$ form a partition of $\{1, \ldots, D\}$ with $\#J_\nu = d_\nu$; we also regard the index sets $J_\nu$ as fixed.

The Haar measure is also invariant under the inversion $U \mapsto U^{-1}$. A consequence is what we will call the "unitary inversion trick": If $\phi$ is any fixed orthonormal basis and $\omega$ a random orthonormal basis with uniform distribution, then the joint distribution of the coefficients $U_{\alpha\beta} = \langle\phi_\beta|\omega_\alpha\rangle$ is the same as if $\omega$ were any fixed orthonormal basis and $\phi$ random with uniform distribution. The reason is that in the former case the matrix $U$ is Haar-distributed, and in the latter case $U^{-1}$ is Haar-distributed, which yields the same distribution of $U$. As a special case, considering only one of the $\omega_\alpha$ and calling it $\psi$, we obtain the following: if $\phi$ is any fixed orthonormal basis and $\psi$ a random vector with uniform distribution, then the joint distribution of the coefficients $\langle\phi_\beta|\psi\rangle$ is the same as if $\psi$ were any fixed unit vector and $\phi$ random with uniform distribution.
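The uniform distributions above are easy to sample. The sketch below assumes NumPy and uses a standard recipe that is not from this paper: take the QR decomposition of a matrix of independent complex Gaussians and absorb the phases of the diagonal of $R$ into $Q$, which yields a Haar-distributed unitary. The rows of $U$ then give a uniformly distributed orthonormal basis via (29), and consecutive blocks of indices play the role of the fixed index sets $J_\nu$. The helper names `haar_unitary` and `random_decomposition` are hypothetical.

```python
import numpy as np

def haar_unitary(D, rng):
    """Haar-distributed D x D unitary via QR of a complex Ginibre matrix."""
    Z = (rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    phases = np.diag(R) / np.abs(np.diag(R))
    return Q * phases                 # multiply column j of Q by phases[j]

def random_decomposition(dims, rng):
    """Projections P_nu onto H_nu = span{omega_alpha : alpha in J_nu}.

    omega_alpha is row alpha of a Haar unitary U (its coefficients U[alpha, beta]
    in the reference basis, as in (29)); J_nu are consecutive index blocks.
    """
    U = haar_unitary(sum(dims), rng)
    projections, start = [], 0
    for d in dims:
        block = U[start:start + d]    # d x D, rows are orthonormal vectors omega_alpha
        projections.append(block.T @ block.conj())   # sum of |omega_alpha><omega_alpha|
        start += d
    return projections

rng = np.random.default_rng(2)
P = random_decomposition([3, 5, 8], rng)             # hypothetical d_nu
print(np.allclose(sum(P), np.eye(16)))                # the H_nu together span H: True
print([int(round(np.trace(p).real)) for p in P])      # dimensions d_nu: [3, 5, 8]
```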

The concept of "most times" is a little more involved because it involves a limiting procedure. Let $\delta' > 0$ be given; we say that a statement $p(t)$ holds for $(1-\delta')$-most $t$ (in the long run) if and only if

$$\liminf_{T\to\infty} \frac{1}{T}\,\Big|\{0 < t < T : p(t)\}\Big| \geq 1 - \delta', \qquad (30)$$

where $|M|$ denotes the size (Lebesgue measure) of the set $M \subseteq \mathbb{R}$. (So this concept of "most" does not directly correspond to a probability distribution.)

3 The Method of Appeal to Typicality

We would like to clarify the status of statements about "most" $\mathscr{D}$ (or, for that matter, most $H$ or most $\psi_0$), and in so doing elaborate on von Neumann's method of appeal to typicality. In 1955, Fierz criticized this method as follows [8, p. 711]:⁴

The physical justification of the hypothesis [that all observers are equally probable] is of course questionable, as the assumption of equal probability for all observers is entirely without reason. Not every macroscopic observable in the sense of von Neumann will really be measurable. Moreover, the observer will try to measure exactly those quantities which appear characteristic of a given system.

In the same vein, Pauli wrote in a private letter to Fierz in 1956 [?]:

As far as assumption B [that all observers are equally probable] is concerned [...] I consider it now not only as lacking in plausibility, but nonsense.

Concerning these objections, we first note that it is surely informative that normality holds for some $\mathscr{D}$, let alone that it holds in fact for most $\mathscr{D}$s, with "most" understood in a mathematically natural way. But we believe that more should be said.

When employing the method of appeal to typicality, one usually uses the language of probability theory. When we do so we do not mean to imply that any of the objects considered is random in reality. What we mean is that certain sets (of wave functions, of orthonormal bases, etc.) have certain sizes (e.g., close to 1) in terms of certain natural measures of size. That is, we describe the behavior that is typical of wave functions, orthonormal bases, etc. However, since the mathematics is equivalent to that of probability theory, it is convenient to adopt that language. For this reason, we do not mean, when using a normalized measure $\mu$, to make an "assumption of a priori probabilities," even if we use the word "probability." Rather, we have in mind that, if a condition is true of most $\mathscr{D}$ or most $H$, this fact may suggest that the condition is also true of a concrete given system, unless we have reasons to expect otherwise.

4 This quotation was translated from the German by R. Tumulka.

Of course, a theorem saying that a condition is true of the vast majority of systems does not prove anything about a concrete given system; if we want to know for sure whether a given system is normal for every initial wave function, we need to check the relevant condition, which is (44) below. Nevertheless, a typicality theorem is, as we have suggested, illuminating; at the very least, it is certainly useful to know which behaviour is typical and which is exceptional. Note also that the terminology of calling a system "typical" or "atypical" might easily lead us to wrongly conclude that an atypical system will not be normal. A given system may have some properties that are atypical and nevertheless satisfy the condition (44), implying that the system is normal for every initial wave function.

The method of appeal to typicality belongs to a long tradition in physics, which also includes Wigner's work on random matrices in the 1950s. In the words of Wigner:



One [. . . ] deals with a specific system, with its proper (though in many 
cases unknown) Hamiltonian, yet pretends that one deals with a multitude 
of systems, all with their own Hamiltonians, and averages over the properties 
of these systems. Evidently, such a procedure can be meaningful only if it 
turns out that the properties in which one is interested are the same for the 
vast majority of the admissible Hamiltonians. 

This method was used by Wigner to obtain specific new and surprising predictions about 
detailed properties of complex quantum systems in nuclear physics. Here the method of 
appeal to typicality is used to establish much less, viz., approach to thermal equilibrium. 

4 Bounds on Deviations 

Two different definitions of normality are relevant to our discussion. Consider a system for which $\mathscr{H}$, $H$, $\mathscr{D}$, and $\psi_0$ are given. Let $N$ denote the number of macro-spaces $\mathscr{H}_\nu$, and let $\varepsilon > 0$ and $\delta' > 0$ also be given.

Definition 1. The system is $\varepsilon$-$\delta'$-normal in von Neumann's [31, 32] sense⁵ if and only if, for $(1-\delta')$-most $t$ in the long run,

$$\Big|\|P_\nu\psi_t\|^2 - \frac{d_\nu}{D}\Big| < \varepsilon\sqrt{\frac{d_\nu}{ND}} \qquad (31)$$

for all $\nu$.

Definition 2. The system is $\varepsilon$-$\delta'$-normal in the strong sense if and only if, for $(1-\delta')$-most $t$ in the long run,

$$\Big|\|P_\nu\psi_t\|^2 - \frac{d_\nu}{D}\Big| < \varepsilon\,\frac{d_\nu}{D} \qquad (35)$$

for all $\nu$.

In the cases considered by von Neumann, (35) is a much stronger inequality than (31). The motivation for considering (35) is twofold. On the one hand, Lemma 1 implies that for most wave functions $\varphi$, the deviation of $\|P_\nu\varphi\|^2$ from $d_\nu/D$ is actually smaller than $d_\nu/D$. (Indeed, the Chebyshev inequality yields for $X = \|P_\nu\varphi\|^2$ that

$$\mu\Big\{\Big|X - \frac{d_\nu}{D}\Big| < \varepsilon\frac{d_\nu}{D}\Big\} \geq 1 - \frac{\mathrm{Var}X}{\varepsilon^2 (d_\nu/D)^2} \geq 1 - \frac{1}{\varepsilon^2 d_\nu}, \qquad (36)$$

which tends to 1 as $d_\nu \to \infty$.) On the other hand, strong normality means that $\|P_\nu\psi_t\|^2$ actually is close to $d_\nu/D$, as the relative error is small. In contrast, the bound in (31) is greater than the value to be approximated, and so would not justify the claim $\|P_\nu\psi_t\|^2 \approx d_\nu/D$.

The basic (trivial) observation about normality is this:

Lemma 2. For arbitrary $\mathscr{H}$, $H$, $\mathscr{D}$, $\psi_0$ with $\|\psi_0\| = 1$ and any $\varepsilon > 0$ and $\delta' > 0$, if

$$G = G(H, \mathscr{D}, \psi_0, \nu) := \overline{\Big(\|P_\nu\psi_t\|^2 - \frac{d_\nu}{D}\Big)^2} < \varepsilon^2\,\frac{d_\nu\,\delta'}{N^2 D} =: \mathrm{bound}_1 \qquad (37)$$

for every $\nu$, then the system is $\varepsilon$-$\delta'$-normal in von Neumann's sense. If

$$G < \varepsilon^2\,\frac{\delta'}{N}\,\frac{d_\nu^2}{D^2} =: \mathrm{bound}_2 \qquad (38)$$

for every $\nu$, then the system is $\varepsilon$-$\delta'$-normal in the strong sense.

Proof. If a non-negative quantity $f(t)$ (such as the squared deviation in (37)) is greater than or equal to $a := \varepsilon^2 d_\nu/(ND) > 0$ for more than the fraction $b := \delta'/N > 0$ of the time interval $[0, T]$, then its average over $[0, T]$ must be greater than $ab$. By assumption (37), this is not the case for any $\nu$ when $T$ is sufficiently large. But $f(t) \geq a$ means violating (31). Therefore, for sufficiently large $T$, the fraction of the time when (31) is violated for any $\nu$ is no greater than $\delta'$; thus, (30) holds with $p(t)$ given by "$\forall\nu$: (31)".

In the same way one obtains (35) from (38). □

5 Let us connect this to how von Neumann formulated the property considered in the QET, which is: for $(1-\delta')$-most $t$ in the long run,

$$\Big|\langle\psi_t|A|\psi_t\rangle - \frac{\mathrm{tr}A}{D}\Big| < \varepsilon\sqrt{\frac{\mathrm{tr}(A^2)}{D}} \qquad (32)$$

for every real-linear combination ("macro-observable") $A = \sum_\nu a_\nu P_\nu$. The quantity $\mathrm{tr}A/D = \mathrm{tr}(\rho_{mc}A)$ is the micro-canonical average of the observable $A$. The quantity $\sqrt{\mathrm{tr}(A^2)/D} = \sqrt{\mathrm{tr}(\rho_{mc}A^2)}$ was suggested by von Neumann as a measure of the magnitude of the observable $A$ in the micro-canonical average. To see that (32) is more or less equivalent to (31), note first that (32) implies, by setting one $a_\nu = 1$ and all others to zero, that

$$\Big|\|P_\nu\psi_t\|^2 - \frac{d_\nu}{D}\Big| < \varepsilon\sqrt{\frac{d_\nu}{D}}. \qquad (33)$$

This is only slightly weaker than (31), namely by a factor of $\sqrt{N}$, when $N$ is much smaller than $D/d_\nu$, as would be the case for the $\mathscr{D}$ considered by von Neumann. Conversely, (31) for every $\nu$ implies (32) for every $A$: This follows from

$$\sum_{\nu=1}^N |x_\nu| \leq \sqrt{N}\,\Big(\sum_{\nu=1}^N x_\nu^2\Big)^{1/2}, \qquad (34)$$

a consequence of the Cauchy-Schwarz inequality, by setting $x_\nu = a_\nu\,\varepsilon\sqrt{d_\nu/(ND)}$.
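The proof of Lemma 2 is Markov's inequality for the uniform distribution of $t$ over $[0, T]$: the fraction of times at which a non-negative $f(t)$ is at least $a$ cannot exceed the average of $f$ divided by $a$. The sketch below checks this on sampled times for one system. It assumes NumPy, and the spectrum, the subspace $\mathscr{H}_\nu$, the initial state and the threshold `a` are all hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
D, d_nu = 40, 10                                  # hypothetical dimensions
E = rng.uniform(0, 50, size=D)                    # hypothetical generic spectrum
Z = rng.normal(size=(D, d_nu)) + 1j * rng.normal(size=(D, d_nu))
Q, _ = np.linalg.qr(Z)                            # orthonormal basis of a random H_nu
P = Q @ Q.conj().T                                # P_nu in the energy eigenbasis
c = rng.normal(size=D) + 1j * rng.normal(size=D)
c /= np.linalg.norm(c)                            # coefficients of psi_0, cf. (39)

t = rng.uniform(0, 1e4, size=20000)               # sample times in [0, T]
psi = np.exp(-1j * np.outer(t, E)) * c            # row k: coefficients of psi at time t[k]
weight = np.einsum("ti,ij,tj->t", psi.conj(), P, psi).real   # ||P_nu psi_t||^2
f = (weight - d_nu / D) ** 2                      # squared deviation averaged in (37)

a = np.quantile(f, 0.9)                           # an arbitrary threshold a > 0
fraction = np.mean(f >= a)                        # fraction of sampled times with f >= a
print(fraction, f.mean() / a)                     # Markov: fraction <= mean(f)/a
```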



5 Von Neumann's QET 

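We now describe von Neumann's result. To evaluate the expression $G$, let $\phi_1, \ldots, \phi_D$ be an orthonormal basis of $\mathscr{H}$ consisting of eigenvectors of the Hamiltonian $H$ with eigenvalues $E_1, \ldots, E_D$, and expand $\psi_0$ in that basis:

$$\psi_0 = \sum_{\alpha=1}^D c_\alpha\phi_\alpha, \qquad \psi_t = \sum_{\alpha=1}^D e^{-iE_\alpha t}c_\alpha\phi_\alpha. \qquad (39)$$

Inserting this into $G$ and multiplying out the square, one obtains

$$\begin{aligned} G = {}& \sum_{\alpha,\beta,\alpha',\beta'} \overline{e^{i(E_\alpha - E_\beta - E_{\alpha'} + E_{\beta'})t}}\; c_\alpha^* c_\beta c_{\alpha'} c_{\beta'}^*\,\langle\phi_\alpha|P_\nu|\phi_\beta\rangle\langle\phi_{\beta'}|P_\nu|\phi_{\alpha'}\rangle \\ & - 2\frac{d_\nu}{D}\,\mathrm{Re}\sum_{\alpha,\beta}\overline{e^{i(E_\alpha - E_\beta)t}}\; c_\alpha^* c_\beta\,\langle\phi_\alpha|P_\nu|\phi_\beta\rangle + \frac{d_\nu^2}{D^2}. \end{aligned} \qquad (40)$$

If $H$ is non-degenerate then $E_\alpha - E_\beta$ vanishes only for $\alpha = \beta$, so the time averaged exponential in the last line is $\delta_{\alpha\beta}$. Furthermore, if $H$ has no resonances then the time averaged exponential in the first line of (40) becomes $\delta_{\alpha\alpha'}\delta_{\beta\beta'} + \delta_{\alpha\beta}\delta_{\alpha'\beta'} - \delta_{\alpha\alpha'}\delta_{\beta\beta'}\delta_{\alpha\beta}$, and we have that

$$\begin{aligned} G &= \sum_{\alpha,\beta}|c_\alpha|^2|c_\beta|^2\Big[|\langle\phi_\alpha|P_\nu|\phi_\beta\rangle|^2 + \langle\phi_\alpha|P_\nu|\phi_\alpha\rangle\langle\phi_\beta|P_\nu|\phi_\beta\rangle\Big] \\ &\quad - \sum_\alpha|c_\alpha|^4\langle\phi_\alpha|P_\nu|\phi_\alpha\rangle^2 - 2\frac{d_\nu}{D}\sum_\alpha|c_\alpha|^2\langle\phi_\alpha|P_\nu|\phi_\alpha\rangle + \frac{d_\nu^2}{D^2} && (41)\\ &= \sum_{\alpha\neq\beta}|c_\alpha|^2|c_\beta|^2\,|\langle\phi_\alpha|P_\nu|\phi_\beta\rangle|^2 + \Big(\sum_\alpha|c_\alpha|^2\langle\phi_\alpha|P_\nu|\phi_\alpha\rangle - \frac{d_\nu}{D}\Big)^2 && (42)\\ &\leq \max_{\alpha\neq\beta}|\langle\phi_\alpha|P_\nu|\phi_\beta\rangle|^2 + \max_\alpha\Big(\langle\phi_\alpha|P_\nu|\phi_\alpha\rangle - \frac{d_\nu}{D}\Big)^2, && (43) \end{aligned}$$

using $\sum_\alpha|c_\alpha|^2 = 1$ (and, for the second term, the convexity of $x \mapsto x^2$). This calculation proves the following.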

Lemma 3. For arbitrary $\mathscr{H}$ and $\mathscr{D}$, for any $H$ without degeneracies and resonances, and for any $\varepsilon > 0$ and $\delta' > 0$, if, for every $\nu$,

$$\max_{\alpha\neq\beta}|\langle\phi_\alpha|P_\nu|\phi_\beta\rangle|^2 + \max_\alpha\Big(\langle\phi_\alpha|P_\nu|\phi_\alpha\rangle - \frac{d_\nu}{D}\Big)^2 < \mathrm{bound}_{1,2}, \qquad (44)$$

then, for every $\psi_0 \in \mathscr{H}$ with $\|\psi_0\| = 1$, the system is $\varepsilon$-$\delta'$-normal in von Neumann's sense (for $\mathrm{bound}_1$) or in the strong sense (for $\mathrm{bound}_2$), respectively.
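Under the no-resonance assumption, (42) gives $G$ in closed form and (43) bounds it. The sketch below assumes NumPy and a hypothetical small system: the spectrum, the subspace and the initial state are random, and a randomly drawn spectrum almost surely has no degeneracies and no resonances. It compares (42) with a Monte Carlo estimate of the time average over random times in a long window and checks the bound (43).

```python
import numpy as np

rng = np.random.default_rng(4)
D, d_nu = 8, 3                                   # hypothetical small example
E = rng.uniform(0, 10, size=D)                   # random spectrum: a.s. no degeneracies/resonances
Z = rng.normal(size=(D, d_nu)) + 1j * rng.normal(size=(D, d_nu))
Q, _ = np.linalg.qr(Z)
P = Q @ Q.conj().T                               # P[a, b] = <phi_a|P_nu|phi_b> in the eigenbasis
c = rng.normal(size=D) + 1j * rng.normal(size=D)
c /= np.linalg.norm(c)
p = np.abs(c) ** 2                               # |c_alpha|^2

off = np.abs(P) ** 2
np.fill_diagonal(off, 0.0)                       # |<phi_a|P_nu|phi_b>|^2 for a != b
diag = P.diagonal().real                         # <phi_a|P_nu|phi_a>
G_formula = p @ off @ p + (p @ diag - d_nu / D) ** 2          # (42)
G_bound = off.max() + np.max((diag - d_nu / D) ** 2)           # (43)

t = rng.uniform(0, 1e6, size=100000)             # random times in a long window
psi = np.exp(-1j * np.outer(t, E)) * c           # coefficients of psi_t, cf. (39)
weight = np.einsum("ti,ij,tj->t", psi.conj(), P, psi).real     # ||P_nu psi_t||^2
G_sampled = np.mean((weight - d_nu / D) ** 2)    # Monte Carlo estimate of the time average

print(G_formula, G_sampled, G_bound)
```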

Note that every initial wave function behaves normally, provided $H$ and $\mathscr{D}$ together satisfy the condition (44). Now von Neumann's QET asserts that for any given $H$ and any suitable given values of the $d_\nu$, most $\mathscr{D}$ will satisfy (44). It is convenient to think of $\mathscr{D}$ as arising from a uniformly distributed orthonormal basis $\omega_1, \ldots, \omega_D$ in the sense that $\mathscr{H}_\nu$ is spanned by those $\omega_\alpha$ with $\alpha \in J_\nu$, as described in Section 2. The coefficients $U_{\alpha\beta} = \langle\phi_\beta|\omega_\alpha\rangle$ of $\omega_\alpha$ relative to the eigenbasis of $H$ then form a Haar-distributed unitary matrix, and

$$\langle\phi_\alpha|P_\nu|\phi_\beta\rangle = \sum_{\gamma\in J_\nu}\langle\phi_\alpha|\omega_\gamma\rangle\langle\omega_\gamma|\phi_\beta\rangle = \sum_{\gamma\in J_\nu}U_{\gamma\alpha}\,(U_{\gamma\beta})^*. \qquad (45)$$
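Relation (45) expresses the left-hand side of (44) through a block of rows of a Haar-distributed matrix, so this quantity can be sampled directly. The sketch below assumes NumPy and the hypothetical choices $d_\nu = D/8$ and $J_\nu = \{1, \ldots, d_\nu\}$ (0-based in the code). It only illustrates how the quantity estimated in Lemma 4 shrinks as $D$ grows, roughly in proportion to $\log D / D$. It does not reproduce von Neumann's constants.

```python
import numpy as np

def haar_unitary(D, rng):
    """Haar-distributed D x D unitary via QR of a complex Ginibre matrix."""
    Z = (rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

def lhs_44(U, d):
    """Left-hand side of (44) for J_nu = {0, ..., d-1}, computed via (45)."""
    D = U.shape[0]
    block = U[:d, :]                              # rows gamma in J_nu
    M = block.T @ block.conj()                    # M[a, b] = sum_gamma U[gamma, a] * conj(U[gamma, b])
    off = np.abs(M) ** 2
    np.fill_diagonal(off, 0.0)                    # keep only alpha != beta
    diag_dev = (M.diagonal().real - d / D) ** 2
    return off.max() + diag_dev.max()

rng = np.random.default_rng(5)
for D in (64, 256, 1024):
    X = lhs_44(haar_unitary(D, rng), D // 8)
    print(D, X, X * D / np.log(D))                # compare X with log(D)/D
```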

Let log denote the natural logarithm. 

Lemma 4. (von Neumann 1929) There is a (big) constant $C_1 > 1$ such that whenever the two natural numbers $D$ and $d_\nu$ satisfy

$$C_1\log D < d_\nu < \frac{D}{C_1}, \qquad (46)$$

and $U$ is a Haar-distributed random unitary $D \times D$ matrix, then



E 



7=1 



2 log£> 



max . EMOr/O* < ^JT » ( 47 ) 



7=1 



To express that $\mu\{x \mid p(x)\} \geq 1 - \delta$, we also say that $p(x)$ holds for $(1-\delta)$-most $x$. Putting together Lemma 3 (for $\mathrm{bound}_1$) and Lemma 4, we have the following:⁶

Theorem 1. (von Neumann's QET, 1929) Let $\varepsilon > 0$, $\delta > 0$, and $\delta' > 0$. Suppose the numbers $D$, $N$, and $d_1, \ldots, d_N$ are such that $d_1 + \ldots + d_N = D$ and, for all $\nu$,

$$\max\Big(C_1, \frac{10N^2}{\varepsilon^2\delta\delta'}\Big)\log D < d_\nu < \frac{D}{C_1}, \qquad (49)$$

where $C_1$ is the constant of Lemma 4. For arbitrary $\mathscr{H}$ of dimension $D$ and any $H$ without degeneracies and resonances, $(1-\delta)$-most orthogonal decompositions $\mathscr{D} = \{\mathscr{H}_\nu\}$ of $\mathscr{H}$ with $\dim\mathscr{H}_\nu = d_\nu$ are such that for every wave function $\psi_0 \in \mathscr{H}$ with $\|\psi_0\| = 1$ the system is $\varepsilon$-$\delta'$-normal in von Neumann's sense.

Proof. Regard $\mathscr{D}$ as random with uniform distribution and let $X$ be the left hand side of (44). By (45), $X$ is a function of the Haar-distributed matrix $U$, and since (49) implies (46), it follows from Lemma 4 that $\mathbb{E}X < 10\log D/D$. By Markov's inequality,

$$\mathbb{P}\big(X \geq \mathrm{bound}_1\big) \leq \frac{\mathbb{E}X}{\mathrm{bound}_1} < \frac{10\log D}{D\,\mathrm{bound}_1} < \delta, \qquad (50)$$

using (49) again. Theorem 1 then follows from Lemma 3. □

6 For clarity we have modified von Neumann's statement a bit.
6 Strong Version



It is an unsatisfactory feature of the QET that all $d_\nu$ are assumed to be much smaller (by at least a factor $C_1$) than $D$, an assumption that excludes the possibility that one of the macro-states $\nu$ corresponds to thermal equilibrium. However, this assumption can be removed, and even the strong sense of normality can be concluded. An inspection of von Neumann's proof of Lemma 4 reveals that it actually proves the following.

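Lemma 5. (von Neumann 1929) There is a (big) constant $C_2 > 1$ such that whenever the two natural numbers $D$ and $d_\nu$ satisfy

$$C_2 < d_\nu < D - C_2, \tag{51}$$

and $U$ is a Haar-distributed random unitary $D\times D$ matrix then, for every $0 < a < d_\nu^2/D^2C_2$,

$$\mathbb{P}\Bigl(\max_{\alpha\neq\beta}\Bigl|\sum_{\gamma=1}^{d_\nu}U_{\gamma\alpha}(U_{\gamma\beta})^*\Bigr|^2 > a\Bigr) < \frac{D^2}{2}\exp\bigl(-4a(D-1)\bigr), \tag{52}$$

$$\mathbb{P}\Bigl(\max_{\alpha}\Bigl(\sum_{\gamma=1}^{d_\nu}|U_{\gamma\alpha}|^2 - \frac{d_\nu}{D}\Bigr)^2 > a\Bigr) < \cdots\,\exp\Bigl(-\theta\,\frac{D^2 a}{2d_\nu}\Bigr), \tag{53}$$

with $\theta = 1 - \cdots$.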

From this we can obtain, with Lemma 3, the following stronger version of the QET, which von Neumann did not mention.

Theorem 2. Theorem 1 remains valid if one replaces "normal in von Neumann's sense" by "normal in the strong sense" and (49) by

$$\max\Bigl(C_2, \sqrt{(3N/\varepsilon^2\delta')\,D\log D}\Bigr) < d_\nu < D - C_2, \tag{54}$$

$$\varepsilon^2\delta' < 2N/C_2, \quad D/\log D > 100N/\varepsilon^2\delta', \quad\text{and}\quad D > 1/\delta, \tag{55}$$

where $C_2$ is the constant of Lemma 5.

Proof. Set $a = \text{bound}_2/2 = (\varepsilon^2\delta'/2N)(d_\nu/D)^2$ in (52) and (53). The first assumption in (55) ensures that the condition $a < d_\nu^2/D^2C_2$ in Lemma 5 is satisfied. The assumption (54) includes

$$d_\nu^2 > (3N/\varepsilon^2\delta')\,D\log D \tag{56}$$

$$> (N/\varepsilon^2\delta')\,D\,(2\log D - \log\delta) \tag{57}$$

using $\log D > -\log\delta$ from the third assumption in (55). Now (57) implies that $4a(D-1) > 2aD > 2\log D - \log\delta$, so that the right-hand side of (52) is less than $\delta/2$. Furthermore, from the second assumption in (55) we have that $1 > 100N\log D/\varepsilon^2\delta' D$, which yields with (56) that $d_\nu^2 > (300N^2/\varepsilon^4\delta'^2)\log^2 D$, and thus $d_\nu > (16N/\theta\varepsilon^2\delta')\log D$, using $\theta > 16/\sqrt{300}$ (which follows from $C_2 > 121$). Because of $\log D > -\log\delta$, we have that

$$d_\nu > (4N/\theta\varepsilon^2\delta')(3\log D - \log\delta), \tag{58}$$

which implies that $\theta D^2 a/2d_\nu = \theta(\varepsilon^2\delta'/4N)\,d_\nu > 3\log D - \log\delta$, so also the right-hand side of (53) is less than $\delta/2$. Thus, (44) is fulfilled for $\text{bound}_2$ with probability at least $1-\delta$. □
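The chain of inequalities in this proof can be evaluated for concrete numbers. In the sketch below, $C_2$ and $\theta$ are assumed inputs (Lemma 5 only calls $C_2$ "big", and the proof needs $C_2 > 121$ so that $\theta > 16/\sqrt{300}$); the values are illustrative and chosen so that (54) and (55) hold. The right-hand side of (52) is compared in logarithmic form to avoid floating-point underflow, and for (53) only the exponent condition used in the proof is checked.

```python
import math

def theorem2_checks(eps, delta, delta_p, N, D, d_nu, C2, theta):
    """Evaluate (54), (55) and the inequalities used in the proof.
    C2 and theta are ASSUMED inputs (Lemma 5 leaves C2 unspecified)."""
    logD, logdel = math.log(D), math.log(delta)
    a = (eps**2 * delta_p / (2 * N)) * (d_nu / D) ** 2          # a = bound_2 / 2
    return {
        "(54)": max(C2, math.sqrt(3 * N / (eps**2 * delta_p) * D * logD)) < d_nu < D - C2,
        "(55)": (eps**2 * delta_p < 2 * N / C2
                 and D / logD > 100 * N / (eps**2 * delta_p) and D > 1 / delta),
        "a allowed in Lemma 5": 0 < a < d_nu**2 / (D**2 * C2),
        "theta > 16/sqrt(300)": theta > 16 / math.sqrt(300),
        # log of the right-hand side of (52), (D^2/2) exp(-4a(D-1)), versus log(delta/2)
        "(52) < delta/2": 2 * logD - math.log(2) - 4 * a * (D - 1) < math.log(delta / 2),
        # exponent condition derived in the proof for the right-hand side of (53)
        "(53) exponent": theta * D**2 * a / (2 * d_nu) > 3 * logD - logdel,
    }

checks = theorem2_checks(eps=0.1, delta=0.01, delta_p=0.1, N=3,
                         D=10**7, d_nu=2 * 10**6, C2=200, theta=0.93)
for name, ok in checks.items():
    print(name, ok)
```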



The stronger conclusion requires the strong assumption that $\sqrt{D\log D} \ll d_\nu$, whereas von Neumann's version needed only $\log D \ll d_\nu \ll D$.

Concerning a thermal equilibrium macro-state with $d_{eq}/D > 1-\varepsilon$, Theorem 2 provides conditions under which most subspaces $\mathcal{H}_{eq}$ of dimension $d_{eq}$ are such that, for every $\psi_0 \in \mathcal{H}$ with $\|\psi_0\| = 1$, the system will be in thermal equilibrium for most times. More precisely, Theorem 2 implies the following: Let $\varepsilon > 0$, $\delta > 0$, and $\delta' > 0$. Suppose that the number $D$ is so big that (55) holds with $N = 2$, and that $d_{eq}$ is such that

$$1-\varepsilon < \frac{d_{eq}}{D} < 1, \tag{59}$$

$$\max\Bigl(C_2, \sqrt{(6/\varepsilon^2\delta')\,D\log D}\Bigr) < d_{eq} < D - \max\Bigl(C_2, \sqrt{(6/\varepsilon^2\delta')\,D\log D}\Bigr). \tag{60}$$

For arbitrary $\mathcal{H}$ of dimension $D$ and any Hamiltonian $H$ without degeneracies and resonances, $(1-\delta)$-most subspaces $\mathcal{H}_{eq}$ of $\mathcal{H}$ with $\dim\mathcal{H}_{eq} = d_{eq}$ are such that for every wave function $\psi_0 \in \mathcal{H}$ with $\|\psi_0\| = 1$, the relation

$$\|P_{eq}\psi_t\|^2 > 1-2\varepsilon \tag{61}$$

holds for $(1-\delta')$-most $t$. In this statement, however, the conditions can be relaxed (in particular, $H$ may have resonances, and the upper bound on $d_{eq}$ in (60) can be replaced with $D$), and the statement can be obtained through a proof that is much simpler than von Neumann's; see [10].
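The step from strong normality to (61) can be spelled out. Assuming that normality in the strong sense (the sense of (35)) means $\bigl|\|P_{eq}\psi_t\|^2 - d_{eq}/D\bigr| < \varepsilon\,d_{eq}/D$ for $(1-\delta')$-most $t$, the lower bound in (59) gives, for those $t$,

$$\|P_{eq}\psi_t\|^2 > (1-\varepsilon)\frac{d_{eq}}{D} > (1-\varepsilon)^2 = 1-2\varepsilon+\varepsilon^2 > 1-2\varepsilon.$$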



7 Misrepresentations 

We now show that the statements presented as the QET in [7, 1] differ from the original theorem (in fact in inequivalent ways) and are dynamically vacuous.

It is helpful to introduce the symbol $\Psi$ to denote "for most." It can be regarded as a quantifier like the standard symbols $\forall$ (for all) and $\exists$ (for at least one). So, if $p(x)$ is a statement containing the free variable $x$ then we write $\Psi x : p(x)$ when we mean $\mu\{x\,|\,p(x)\} > 1-\delta$, assuming that it is clear from the context which measure $\mu$ and which magnitude of $\delta$ are intended. With this notation, the misunderstanding as described in (26) versus (25) can be expressed by saying that the quantifiers $\Psi x$ and $\forall y$ do not commute:

$$\Psi x\;\forall y : p(x,y) \quad\not\Leftrightarrow\quad \forall y\;\Psi x : p(x,y). \tag{62}$$
The two expressions are not equivalent. Indeed, the set of $x$'s (whose measure is close to 1) is allowed to depend on $y$ if the quantifiers are of the form $\forall y\,\Psi x$ but not if they are of the form $\Psi x\,\forall y$. That is, if they are of the form $\Psi x\,\forall y$ then there exists a set $M$ of $x$'s, not depending on $y$, with $\mu_x(M) > 1-\delta$ such that $\forall x \in M\;\forall y : p(x,y)$. Thus the first expression in (62) is stronger than the second:

$$\Psi x\;\forall y : p(x,y) \;\Rightarrow\; \forall y\;\Psi x : p(x,y). \tag{63}$$

This should be contrasted with situations in which quantifiers do commute, for example $\forall x\,\forall y \Leftrightarrow \forall y\,\forall x$ and $\Psi x\,\Psi y \Leftrightarrow \Psi y\,\Psi x$ (though the bound $\delta$ on the exceptions may become worse$^7$). An exceptional case, in which $\forall y$ and $\Psi x$ do commute, occurs when the variable $y$ assumes only a very limited number $n$ (e.g., $n = 10$) of possible values: Then $\forall y\,\Psi x : p$ implies $\Psi x\,\forall y : p$ with, however, the bound $\delta$ on the exceptions worse by a factor of $n$, $\delta \to n\delta$. In our case, however, $y = \psi_0$ varies in an infinite set.
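The non-commutation (62) already shows up in a finite toy case. The sketch below (Python with NumPy; the predicate, sizes, and the helper `most` are our own illustrative choices) uses the uniform measure on $\{0,\ldots,n-1\}$ for both $x$ and $y$ and the predicate $p(x,y)$: "$x \neq y$". For every $y$ only one $x$ fails, so $\forall y\,\Psi x : p$ holds, while no single $x$ works for all $y$, so $\Psi x\,\forall y : p$ fails.

```python
import numpy as np

def most(mask, delta):
    """'For most' quantifier under the uniform measure: fraction of True entries > 1 - delta."""
    return bool(np.mean(mask) > 1 - delta)

n, delta = 100, 0.1
x = np.arange(n)[:, None]        # rows index x
y = np.arange(n)[None, :]        # columns index y
p = x != y                       # toy predicate p(x, y): "x differs from y"

# forall y, most x: for each fixed y only x = y fails, a fraction 1/n of all x
forall_y_most_x = all(most(p[:, j], delta) for j in range(n))
# most x, forall y: the set of x with p(x, y) for ALL y would need measure > 1 - delta
most_x_forall_y = most(p.all(axis=1), delta)
print(forall_y_most_x, most_x_forall_y)   # True False
```

Here $y$ takes $n$ values and each $y$ has exceptions of measure $1/n$, so the finite-$n$ argument of the text only yields exceptions of measure $n\cdot(1/n) = 1$ for $\Psi x\,\forall y$, which is useless.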

In this symbolic notation, and leaving out some details, Theorems 1 and 2 can be paraphrased as:

$$\forall H\;\Psi\mathscr{D}\;\forall\psi_0\;\Psi t\;\forall\nu : \|P_\nu\psi_t\|^2 \approx \frac{d_\nu}{D}, \tag{66}$$

where $\forall H$ should be taken to mean "for all Hamiltonians without degeneracies and resonances," and $\approx$ should be understood either in the wide sense of (31) for Theorem 1 or in the sense of (35) for Theorem 2. Let us now look at what [7, 1] write.

We focus first on the article of Bocchieri and Loinger [1]. As we show presently, their version of the QET has a different order of quantifiers, with fatal consequences. It also differs in a second way from the original as it deals with the strong sense of normality instead of von Neumann's sense; this, of course, is a strengthening of von Neumann's statement. Finally, their version drops von Neumann's hypotheses on the Hamiltonian (no degeneracy, no resonance); this, of course, is a difference that Bocchieri and Loinger were aware of and emphasized as evidence that von Neumann made unnecessary hypotheses.

Indeed, in [1], the statement "These relations constitute von Neumann's ergodic theorem" (p. 670) is preceded by their Eq. (13), which in our notation reads

$$\mathbb{E}\,\overline{\|P_\nu\psi_t\|^2} = \frac{d_\nu}{D}, \qquad \frac{\mathbb{E}\,\overline{\bigl(\|P_\nu\psi_t\|^2 - d_\nu/D\bigr)^2}}{(d_\nu/D)^2} \ll 1, \tag{67}$$



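7 More precisely, if

$$\mu_x\bigl\{x \,\big|\, \mu_y\{y\,|\,p(x,y)\} > 1-\delta_y\bigr\} > 1-\delta_x \tag{64}$$

then, for every $\varepsilon_x > 0$,

$$\mu_y\bigl\{y \,\big|\, \mu_x\{x\,|\,p(x,y)\} > 1-\varepsilon_x\bigr\} > 1-\varepsilon_y \tag{65}$$

with $\varepsilon_y \geq (\delta_x+\delta_y-\delta_x\delta_y)/\varepsilon_x$. (For example, (65) holds for $\varepsilon_x = \varepsilon_y = \sqrt{\delta_x+\delta_y}$.) To see this, note that (64) implies that, relative to the product measure $\mu_x\otimes\mu_y$, at least the fraction $(1-\delta_x)(1-\delta_y)$ of all pairs $(x,y)$ satisfies $p(x,y)$; thus,

$$\int\mu_y(dy)\,\mu_x\{x\,|\,p(x,y)\} = \mu_x\otimes\mu_y\{(x,y)\,|\,p(x,y)\} > 1-(\delta_x+\delta_y-\delta_x\delta_y),$$

and this implies (65) by Markov's inequality applied to the function $y \mapsto 1-\mu_x\{x\,|\,p(x,y)\}$.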
where the average $\mathbb{E}$ is taken over $\mathscr{D}$ relative to the uniform distribution.$^8$ From this it follows that for all $\psi_0$ it is true for most $\mathscr{D}$ that $\|P_\nu\psi_t\|^2 \approx d_\nu/D$ for most $t$, with deviation small compared to $d_\nu/D$. Moreover, as (67) holds for all $H$, and, via (38), the conclusion can be shown to hold simultaneously for all $\nu$, the version of [1] can be written, in analogy to (66), as

$$\forall H\;\forall\psi_0\;\Psi\mathscr{D}\;\Psi t\;\forall\nu : \|P_\nu\psi_t\|^2 \approx \frac{d_\nu}{D}. \tag{68}$$

This statement is not only inequivalent to von Neumann's, it is also dynamically vacuous. By this we mean that it follows from a statement that does not refer to any time other than 0. Indeed, the relations (67) are proved in [1] by first proving for any fixed $\psi$ that$^9$

$$\mathbb{E}\|P_\nu\psi\|^2 = \frac{d_\nu}{D}, \qquad \frac{\mathbb{E}\bigl(\|P_\nu\psi\|^2 - d_\nu/D\bigr)^2}{(d_\nu/D)^2} \ll 1, \tag{69}$$

which is (67) without the procedure of time averaging, then setting $\psi = \psi_t$ and taking the time average on both relations, and finally commuting the time average and the average $\mathbb{E}$ over $\mathscr{D}$, which is always allowed by Fubini's theorem. In the notation using the symbol $\Psi$, (69) yields

$$\forall\psi\;\Psi\mathscr{D}\;\forall\nu : \|P_\nu\psi\|^2 \approx \frac{d_\nu}{D}. \tag{70}$$

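This fact is the non-dynamical reason why (68) is true: Since (70) applies to every $\psi$, it applies in particular to $\psi_t$ for any $H$, $\psi_0$, and $t$. That is, (70) implies

$$\forall H\;\forall\psi_0\;\forall t\;\Psi\mathscr{D}\;\forall\nu : \|P_\nu\psi_t\|^2 \approx \frac{d_\nu}{D}, \tag{71}$$

and since $\forall t \Rightarrow \Psi t$, and two $\Psi$ quantifiers commute (with a possibly worse bound on the exceptions), (71) implies (68). Thus, (68) is dynamically vacuous. This fact was essentially the criticism put forward against the QET in [1].$^{10}$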
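The non-dynamical character of (69) and (70) is easy to see numerically: no Hamiltonian and no time enter at all. The sketch below (NumPy; sizes and seed are arbitrary choices of ours) relies on unitary invariance: for a fixed $\psi$ and a uniformly random subspace of dimension $d_\nu$, $\|P_\nu\psi\|^2$ has the same distribution as for a fixed subspace and a uniformly random unit vector $\psi$. The sample mean should be close to $d_\nu/D$, and the relative variance close to $(D-d_\nu)/(d_\nu(D+1))$, the variance of a Beta$(d_\nu, D-d_\nu)$ variable divided by $(d_\nu/D)^2$ (a standard closed form, not taken from the paper).

```python
import numpy as np

rng = np.random.default_rng(2)
D, d_nu, samples = 200, 50, 4000

# Uniformly random unit vectors in C^D; the subspace H_nu is fixed (first d_nu coordinates).
Z = rng.standard_normal((samples, D)) + 1j * rng.standard_normal((samples, D))
psi = Z / np.linalg.norm(Z, axis=1, keepdims=True)
weights = np.sum(np.abs(psi[:, :d_nu]) ** 2, axis=1)    # ||P_nu psi||^2, one value per sample

mean = weights.mean()
rel_var = weights.var() / (d_nu / D) ** 2
print(mean, d_nu / D)                                    # sample mean versus d_nu / D
print(rel_var, (D - d_nu) / (d_nu * (D + 1)))            # sample versus exact relative variance
```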

We turn to the article of Farquhar and Landsberg [7]. As we show presently, their version of the QET differs from the original in the same ways as the version of [1], as well as in that it concerns only the time average of $\|P_\nu\psi_t\|^2$, while the original QET concerns the value of $\|P_\nu\psi_t\|^2$ for most $t$.



8 More precisely, their proof shows that for every $\eta > 0$ and every $H$, if every $d_\nu > 1/\eta$ then, for all $\psi_0$ and $\nu$, $\mathbb{E}\,\overline{\bigl|\|P_\nu\psi_t\|^2 - d_\nu/D\bigr|^2} < \eta\,d_\nu^2/D^2$.

9 In fact, these expectation values are independent of $\psi$, by the invariance of the Haar measure.
10 The exact nature of the criticism, though, remained a bit unclear in [1], as Bocchieri and Loinger did not make explicit what it means for a statement to be dynamically vacuous. They pointed out that (67) is valid for every Hamiltonian, including $H = 0$, and that the proof of (67) by means of (69) did not, in fact, require that $\psi_t = \exp(-iHt)\psi_0$, but only that $\psi_t = f_t(\psi_0)$ for an arbitrary measure-preserving mapping $f_t$ from the unit sphere to itself. These facts strongly suggest that (67) is dynamically vacuous, but should per se not be regarded as a proof; for example, the Poincaré recurrence theorem [22] is valid for every Hamiltonian, or in fact for every measure-preserving flow $f_t$ on the unit sphere in a finite-dimensional Hilbert space, but clearly has dynamical content. That is why we defined a "dynamically vacuous statement" to be a logical consequence of a statement that does not refer to time.
Indeed, the result on which their version of the QET is based is expressed in their Eq. (2.17), which holds for every $H$ and $D > 3$ and reads in our notation as

$$\frac{\mathbb{E}\,\bigl|\overline{\|P_\nu\psi_t\|^2} - d_\nu/D\bigr|^2}{d_\nu^2/D^2} < \frac{2(D-d_\nu)}{d_\nu D}. \tag{72}$$

For large $d_\nu$, this yields

$$\frac{\mathbb{E}\,\bigl|\overline{\|P_\nu\psi_t\|^2} - d_\nu/D\bigr|^2}{d_\nu^2/D^2} \ll 1, \tag{73}$$

and thus

$$\forall H\;\forall\psi_0\;\Psi\mathscr{D}\;\forall\nu : \overline{\|P_\nu\psi_t\|^2} \approx \frac{d_\nu}{D}. \tag{74}$$

This result concerns only the time average of $\|P_\nu\psi_t\|^2$ but provides no control over the time variance, and so does not inform us about the behavior for most $t$. Moreover, (74) has the wrong order of quantifiers. Finally, since (73) follows from the inequality in (69) using $\overline{f(t)}^{\,2} \leq \overline{f(t)^2}$, it is a logical consequence of a dynamically vacuous statement, and thus is itself dynamically vacuous.
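A toy function shows why a statement about the time average alone, such as (74), says nothing about the behavior for most $t$. In the sketch below (NumPy), $f(t) = \tfrac12 + \tfrac12\cos t$ is an illustrative stand-in of our own choosing, not a quantity from the paper: its time average is $\tfrac12$, yet it lies within $0.05$ of $\tfrac12$ only about 6% of the time. The last line checks the inequality $\overline{f(t)}^{\,2} \leq \overline{f(t)^2}$ used above.

```python
import numpy as np

# Toy stand-in for t -> ||P_nu psi_t||^2, sampled on a fine grid over 100 periods.
t = np.linspace(0.0, 2 * np.pi * 100, 200_001)
f = 0.5 + 0.5 * np.cos(t)

time_avg = f.mean()
near_avg = np.mean(np.abs(f - 0.5) < 0.05)     # fraction of times within 0.05 of the average
print(time_avg, near_avg)                      # roughly 0.5 and 0.06
print(time_avg**2 <= np.mean(f**2))            # (time average)^2 <= time average of the square
```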



8 Typical Hamiltonian 

Normality for most $\mathscr{D}$'s is more or less equivalent to normality for most $H$'s. Indeed, by the "unitary inversion trick" described in Section 2, one can trade the typicality assumption on $\mathscr{D}$ in the QET for a typicality assumption on $H$, without any essential modification of the proof. This is because the relevant condition (44) involves only

$$\langle\phi_\alpha|P_\nu|\phi_\beta\rangle = \sum_{\gamma=1}^{d_\nu}\langle\phi_\alpha|\omega_\gamma\rangle\langle\omega_\gamma|\phi_\beta\rangle, \tag{75}$$

where we can either regard $\phi$ as fixed and $\omega$ as random (as von Neumann did) or vice versa. With this change, the (strong) QET reads as follows.

Theorem 3. Let $\varepsilon > 0$, $\delta' > 0$, and $\delta > 0$. Suppose the numbers $D$, $N$, and $d_1 + \ldots + d_N = D$ satisfy (54) and (55). Suppose further that the real numbers $E_1, \ldots, E_D$ are all distinct and have no resonances as defined in (23). For arbitrary $\mathcal{H}$ of dimension $D$ and any orthogonal decomposition $\mathscr{D} = \{\mathcal{H}_\nu\}$ with $\dim\mathcal{H}_\nu = d_\nu$, $(1-\delta)$-most operators $H$ with eigenvalues $E_1, \ldots, E_D$ are such that for every wave function $\psi_0 \in \mathcal{H}$ with $\|\psi_0\| = 1$ the system is $\varepsilon$-$\delta'$-normal in the strong sense.

This means, in the notation of (66), that

$$\forall\mathscr{D}\;\Psi H\;\forall\psi_0\;\Psi t\;\forall\nu : \|P_\nu\psi_t\|^2 \approx \frac{d_\nu}{D}. \tag{76}$$
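Theorem 3 can be illustrated by a small simulation in which the decomposition is fixed and the Hamiltonian is random. The sketch is only a toy under our own assumptions: NumPy; a Haar-random eigenbasis and Gaussian eigenvalues (distinct with probability one) standing in for a "typical" $H$; $D = 400$ with $\mathcal{H}_{eq}$ spanned by the first $d_{eq} = 360$ standard basis vectors; and an initial state lying entirely outside $\mathcal{H}_{eq}$. For almost all sampled times, $\|P_{eq}\psi_t\|^2$ should exceed $1-2\varepsilon$ with $\varepsilon = 0.15$, the threshold appearing in (61).

```python
import numpy as np

def haar_unitary(D, rng):
    """Haar-random D x D unitary via QR of a complex Gaussian matrix (phase-corrected)."""
    Z = (rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diagonal(R) / np.abs(np.diagonal(R)))

rng = np.random.default_rng(3)
D, d_eq, eps = 400, 360, 0.15            # toy sizes; d_eq / D = 0.9 > 1 - eps as in (59)
E = rng.standard_normal(D)               # random eigenvalues (distinct with probability 1)
Phi = haar_unitary(D, rng)               # random eigenbasis: stands in for a "typical" H
# The decomposition stays fixed: H_eq = span of the first d_eq standard basis vectors.

psi0 = np.zeros(D, dtype=complex)
psi0[d_eq] = 1.0                         # starts OUTSIDE H_eq: ||P_eq psi0||^2 = 0
c = Phi.conj().T @ psi0                  # coefficients <phi_alpha | psi0>

times = rng.uniform(0.0, 1000.0, size=500)
psi_t = Phi @ (c[:, None] * np.exp(-1j * np.outer(E, times)))   # column k: psi at times[k]
w_eq = np.sum(np.abs(psi_t[:d_eq]) ** 2, axis=0)                # ||P_eq psi_t||^2

frac = np.mean(w_eq > 1 - 2 * eps)       # fraction of sampled times satisfying (61)
print(frac, w_eq.mean())
```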

It would be nice also to have a similar theorem asserting that normality for all $\psi_0$ is typical even within a smaller class of Hamiltonians, say those of the form

$$H = -\sum_{i=1}^{n}\frac{\hbar^2}{2m_i}\nabla_i^2 + \sum_{i=1}^{n}U(x_i) + \sum_{i\neq j}V(x_i - x_j), \tag{77}$$

where the pair potential $V$ is allowed to be any function from a suitable class. Here, $n$ denotes the number of particles, $x_i \in \mathbb{R}^3$ the coordinate of particle $i$, $\nabla_i$ the derivative relative to $x_i$, $m_i$ the mass of particle $i$, and $U$ the external potential. However, such a theorem seems presently out of reach.

As a corollary of (76), one obtains for $\nu = eq$ that

$$\forall\mathcal{H}_{eq}\;\Psi H\;\forall\psi_0\;\Psi t : \|P_{eq}\psi_t\|^2 \approx 1, \tag{78}$$

where $\forall\mathcal{H}_{eq}$ should be taken to mean "for all subspaces $\mathcal{H}_{eq}$ of dimension $d_{eq}$" (which is greater than $(1-\varepsilon')D$). In fact, this conclusion remains true [10] under weaker technical assumptions ($H$ may have resonances, and (54) can be replaced by $(1-\varepsilon')D < d_{eq} < D$).

As a corollary of (78), for a typical Hamiltonian every energy eigenfunction is in thermal equilibrium, i.e., close to $\mathcal{H}_{eq}$. (This statement could, of course, be obtained more directly: The condition that every energy eigenfunction is in equilibrium is a special case, for $\nu = eq$, of the condition $\langle\phi_\alpha|P_\nu|\phi_\alpha\rangle \approx d_\nu/D$ for all $\alpha$, which is part of condition (44), which by Lemma 4 is typically obeyed.)

We can be a bit more general than either Theorem 2 or Theorem 3 and say that what is needed to obtain strong normality is that the unitary matrix $U_{\alpha\beta} = \langle\phi_\beta|\omega_\alpha\rangle$ relating the energy eigenbasis $\phi_\beta$ to a basis $\omega_\alpha$ aligned with $\mathscr{D}$ be like most unitary matrices in that it satisfies (44). This means, more or less, that the energy eigenbasis and $\mathscr{D}$ should be unrelated. By the way, this is connected to the reason why $\mathcal{H}$ was physically interpreted as a "micro-canonical" space, i.e., one corresponding to an "energy shell": For a more comprehensive Hilbert space including states of macroscopically different energies, the energy eigenbasis and $\mathscr{D}$ would no longer be unrelated. Indeed, a sufficiently coarse-grained version of the Hamiltonian should be among the macroscopic observables and thus be diagonal in the $\omega_\alpha$ basis.

9 Comparison with Recent Literature 

The results of [28, 25, 16] also concern conditions under which a quantum system will spend most of the time in "thermal equilibrium." For the sake of comparison, their results, as well as von Neumann's, can be described in a unified way as follows. Let us say that a system with initial wave function $\psi(0)$ equilibrates relative to a class $\mathscr{A}$ of observables if for most times $t$,

$$\langle\psi(t)|A|\psi(t)\rangle \approx \mathrm{Tr}\Bigl(\overline{|\psi(t)\rangle\langle\psi(t)|}\,A\Bigr) \quad\text{for all } A \in \mathscr{A}. \tag{79}$$

We then say that the system thermalizes relative to $\mathscr{A}$ if it equilibrates and, moreover,

$$\mathrm{Tr}\Bigl(\overline{|\psi(t)\rangle\langle\psi(t)|}\,A\Bigr) \approx \mathrm{Tr}(\rho_{mc}A) \quad\text{for all } A \in \mathscr{A}, \tag{80}$$

with $\rho_{mc}$ the micro-canonical density matrix (in our notation, $1/D$ times the projection $P$ to $\mathcal{H}$). With these definitions, the results of [28, 25, 16] can be formulated by saying that, under suitable hypotheses on $H$ and $\psi(0)$ and for large enough $D$, a system will equilibrate, or even thermalize, relative to a suitable class $\mathscr{A}$. Von Neumann's quantum ergodic theorem establishes thermalization for a family $\mathscr{A}$ of commuting observables, the algebra generated by $\{M_1, \ldots, M_k\}$ in the notation of Section 1.
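The definitions (79) and (80) can be made concrete for a small system. The sketch below (NumPy, toy sizes of our choosing) assumes a non-degenerate spectrum, in which case the time-averaged density matrix is $\sum_\alpha |\langle\phi_\alpha|\psi(0)\rangle|^2\,|\phi_\alpha\rangle\langle\phi_\alpha|$ because off-diagonal terms in the energy basis average out. It uses a Haar-random eigenbasis with Gaussian eigenvalues, a single observable $A$ (the projection onto half of the space), and an initial state with $\langle A\rangle = 1$; it reports the fraction of sampled times at which (79) holds to within $0.1$, and the gap in (80).

```python
import numpy as np

def haar_unitary(D, rng):
    """Haar-random D x D unitary via QR of a complex Gaussian matrix (phase-corrected)."""
    Z = (rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diagonal(R) / np.abs(np.diagonal(R)))

rng = np.random.default_rng(4)
D = 300
E = rng.standard_normal(D)                 # toy spectrum, non-degenerate with probability 1
Phi = haar_unitary(D, rng)                 # columns: energy eigenvectors phi_alpha
A = np.diag([1.0] * (D // 2) + [0.0] * (D - D // 2))   # toy observable: projection onto half the space

psi0 = np.zeros(D, dtype=complex)
psi0[0] = 1.0                              # <A> = 1 at t = 0, whereas Tr(rho_mc A) = 1/2
c = Phi.conj().T @ psi0                    # coefficients <phi_alpha | psi(0)>

# Non-degenerate spectrum: the time average of |psi(t)><psi(t)| keeps only the
# energy-diagonal part, sum_alpha |c_alpha|^2 |phi_alpha><phi_alpha|.
rho_bar = (Phi * np.abs(c) ** 2) @ Phi.conj().T
rho_mc = np.eye(D) / D

times = rng.uniform(5.0, 500.0, size=400)
psi_t = Phi @ (c[:, None] * np.exp(-1j * np.outer(E, times)))   # column k: psi(times[k])
expect_t = np.einsum("it,ij,jt->t", psi_t.conj(), A, psi_t).real

target = np.trace(rho_bar @ A).real
equilibrates = np.mean(np.abs(expect_t - target) < 0.1)         # fraction of times where (79) holds within 0.1
thermal_gap = abs(target - np.trace(rho_mc @ A).real)           # deviation in (80)
print(equilibrates, thermal_gap)
```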

Tasaki [28] as well as Linden, Popescu, Short, and Winter [16] consider a system coupled to a heat bath, $\mathcal{H}_{total} = \mathcal{H}_{sys}\otimes\mathcal{H}_{bath}$, and take $\mathscr{A}$ to contain all operators of the form $A_{sys}\otimes 1_{bath}$. Tasaki considers a rather special class of Hamiltonians and establishes thermalization assuming that

$$\max_\alpha\,|\langle\phi_\alpha|\psi(0)\rangle|^2 \ll 1, \tag{81}$$

a condition that implies that many eigenstates of $H$ contribute to $\psi(0)$ appreciably and that can (more or less) equivalently be rewritten as

$$\sum_\alpha|\langle\phi_\alpha|\psi(0)\rangle|^4 \ll 1. \tag{82}$$

Under the assumption (82) on $\psi(0)$, Linden et al. establish equilibration for $H$ satisfying (23). They also establish a result in the direction of thermalization under the additional hypothesis that the dimension of the energy shell of the bath is much greater than $\dim\mathcal{H}_{sys}$.
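The quantities in (81) and (82) are simple functions of the energy-basis weights $p_\alpha = |\langle\phi_\alpha|\psi(0)\rangle|^2$, and since $\sum_\alpha p_\alpha = 1$ the elementary bounds $\bigl(\max_\alpha p_\alpha\bigr)^2 \leq \sum_\alpha p_\alpha^2 \leq \max_\alpha p_\alpha$ show why the two conditions are (more or less) equivalent. In the sketch below (NumPy), the energy eigenbasis is taken to be the standard basis, an assumption that costs nothing since only the coefficients matter; a uniformly random state spreads over many eigenstates, whereas a single eigenstate gives 1 for both quantities.

```python
import numpy as np

def tasaki_quantities(coeffs):
    """Given coefficients <phi_alpha | psi(0)>, return the left-hand sides of (81) and (82)."""
    p = np.abs(coeffs) ** 2
    return p.max(), np.sum(p ** 2)

rng = np.random.default_rng(5)
D = 400
z = rng.standard_normal(D) + 1j * rng.standard_normal(D)
spread = z / np.linalg.norm(z)           # generic state: weight spread over all eigenstates
single = np.zeros(D, dtype=complex)
single[0] = 1.0                          # psi(0) equal to a single energy eigenstate

pmax, ipr = tasaki_quantities(spread)
print(pmax, ipr)                         # both much smaller than 1
print(tasaki_quantities(single))         # both equal to 1
```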

Reimann's mathematical result [25] can be described in the above scheme as follows. Let $\mathscr{A}$ be the set of all observables $A$ with (possibly degenerate) eigenvalues between 0 and 1 such that the absolute difference between any two eigenvalues is at least (say) $10^{-1000}$. He establishes equilibration for $H$ satisfying (23), assuming that $\psi(0)$ satisfies